Week 3 tutorial 2 - AI 4 Chemistry

Table of contents

  0. Relevant packages
  1. Train GNNs using chemprop

0. Relevant packages

Chemprop

The Chemprop package implements message passing neural networks for molecular property prediction. The method is described in the paper Analyzing Learned Molecular Representations for Property Prediction. It was applied to molecules in the paper A Deep Learning Approach to Antibiotic Discovery and to reactions in the paper Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction.

Documentation: Full documentation of Chemprop is available at https://chemprop.readthedocs.io/en/latest/.

Website: A web prediction interface with some trained Chemprop models is available at chemprop.csail.mit.edu.

Tutorial: These slides provide a Chemprop tutorial and highlight recent additions as of April 28th, 2020.

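# Install chemprop 1.x, which provides the chemprop_train command used in this tutorial
# (chemprop 2.x replaced it with the `chemprop train` subcommand)
# Colab already comes with PyTorch preinstalled
! pip install "chemprop<2" rdkit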

# Download ESOL data
! mkdir -p data/
! wget https://raw.githubusercontent.com/schwallergroup/ai4chem_course/main/notebooks/02%20-%20Supervised%20Learning/data/esol.csv -O data/esol.csv

Set the random seeds so that the experiments are reproducible.

import random
import numpy as np
import torch

# Random Seeds and Reproducibility
torch.manual_seed(0)
torch.cuda.manual_seed(0)
np.random.seed(0)
random.seed(0)
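
The seeding calls above can be wrapped in a small helper so that every experiment starts from the same state. This is a minimal sketch: `set_seed` is a hypothetical helper name, not part of chemprop, and it skips NumPy or PyTorch if they are not installed. The check at the end only shows that reseeding Python's own generator reproduces the same numbers.

```python
import random


def set_seed(seed: int = 0) -> None:
    """Seed every random number generator available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when no GPU is present
    except ImportError:
        pass  # PyTorch is not installed, nothing to seed


# Reseeding with the same value reproduces the same sequence of draws
set_seed(0)
first = [random.random() for _ in range(3)]
set_seed(0)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Note that seeding inside the notebook does not automatically control a separate process started with `!`, so the training command needs its own seed handling.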

1. Train GNNs using chemprop

To train a GNN model, run:

chemprop_train --data_path <path> --dataset_type <type> --save_dir <dir>

where <path> is the path to a CSV file containing the dataset, <type> is one of [classification, regression, multiclass, spectra] depending on the kind of prediction task, and <dir> is the directory where the training results and model checkpoints will be saved. For details on the expected CSV format, please see the Chemprop documentation.
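
To make the CSV layout concrete, here is a minimal sketch that writes and reads back a toy file. It assumes the default layout of the chemprop 1.x command line: a header row, SMILES strings in the first column, and the target values in the remaining columns. The file name `toy_regression.csv` and the column names are arbitrary choices for illustration, and the target values are placeholders, not measurements.

```python
import csv
import os
import tempfile

# Real SMILES strings, but dummy target values (placeholders, not measured data)
rows = [
    ("CCO", 0.0),        # ethanol
    ("c1ccccc1", 1.0),   # benzene
    ("CC(=O)O", 2.0),    # acetic acid
]

# Write the toy dataset to a temporary directory
toy_path = os.path.join(tempfile.mkdtemp(), "toy_regression.csv")
with open(toy_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["smiles", "target"])  # header: SMILES column first, then targets
    writer.writerows(rows)

# Read it back to check the layout
with open(toy_path, newline="") as f:
    header, *body = list(csv.reader(f))

print(header)     # ['smiles', 'target']
print(len(body))  # 3
```

A file with several target columns follows the same pattern, with one extra column per property.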

For example:

chemprop_train --data_path data/tox21.csv --dataset_type classification --save_dir tox21_checkpoints

A full list of available command-line arguments can be found in chemprop/args.py.

For the available model evaluation metrics, please see the Chemprop README.md.
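
The training command in this tutorial uses the root-mean-square error (RMSE) as its metric. As a reminder of what that number means, here is a minimal pure-Python sketch of the standard RMSE definition. The `rmse` function is written for illustration only and is not chemprop's own implementation; the input values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: two perfect predictions and one that is off by 2
y_true = [0.0, 1.0, 2.0]
y_pred = [0.0, 1.0, 4.0]
score = rmse(y_true, y_pred)
print(round(score, 4))  # 1.1547
```

Because the errors are squared before averaging, a single large error raises the RMSE more than several small ones. The result has the same units as the target.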

! chemprop_train --data_path data/esol.csv \
                 --dataset_type regression \
                 --save_dir esol_ckpts \
                 --metric rmse \
                 --split_sizes 0.7 0.1 0.2 \
                 --epochs 60
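
When running several experiments, it can be convenient to assemble the command line in Python rather than typing it by hand. The sketch below rebuilds the command above from its parts. `build_train_args` is a hypothetical helper written for this tutorial, not a chemprop function; it only uses the flags that already appear in the command above, and its checks are simple sanity checks of our own.

```python
import shlex

# Dataset types listed in this tutorial
DATASET_TYPES = {"classification", "regression", "multiclass", "spectra"}


def build_train_args(data_path, dataset_type, save_dir,
                     metric=None, split_sizes=None, epochs=None):
    """Return the chemprop_train arguments as a list of strings."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    args = ["--data_path", data_path,
            "--dataset_type", dataset_type,
            "--save_dir", save_dir]
    if metric is not None:
        args += ["--metric", metric]
    if split_sizes is not None:
        # Train/validation/test fractions should add up to 1
        if abs(sum(split_sizes) - 1.0) > 1e-9:
            raise ValueError("split sizes must sum to 1")
        args += ["--split_sizes", *[str(s) for s in split_sizes]]
    if epochs is not None:
        args += ["--epochs", str(epochs)]
    return args


train_args = build_train_args("data/esol.csv", "regression", "esol_ckpts",
                              metric="rmse", split_sizes=(0.7, 0.1, 0.2), epochs=60)
command = shlex.join(["chemprop_train", *train_args])
print(command)
```

The printed string can be pasted after `!` in a notebook cell. In chemprop 1.x, the README also describes passing such a list to `chemprop.args.TrainArgs().parse_args(...)` and then calling `chemprop.train.cross_validate(args=..., train_func=chemprop.train.run_training)` to train from Python; check the documentation of your installed version before relying on this.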